Abstract: Cloud computing provides massive computation power and storage capacity, which enable users to deploy computation-intensive and data-intensive applications without infrastructure investment. During the processing of such applications, a large volume of intermediate datasets is generated, and these are often stored to save the cost of recomputing them. However, preserving the privacy of intermediate datasets becomes a challenging problem because adversaries may recover privacy-sensitive information by analysing multiple intermediate datasets. Existing approaches widely adopt encrypting all datasets in the cloud to address this challenge. We argue that encrypting all intermediate datasets is neither efficient nor cost-effective, because it is very time consuming and costly for data-intensive applications to encrypt and decrypt datasets frequently while performing operations on them. This paper proposes a novel approach, based on an upper-bound constraint on privacy leakage, to identify which intermediate datasets need to be encrypted and which do not, so that privacy-preserving cost can be saved while the privacy requirements of data holders are still satisfied. Evaluation results demonstrate that the privacy-preserving cost of intermediate datasets can be significantly reduced with our approach compared with existing approaches in which all datasets are encrypted.
Keywords: Cloud computing, data storage privacy, privacy preserving, intermediate dataset, privacy upper bound